#supervised reinforcement learning
01/11/2025
SRL: Teaching 7B Models to Reason Step-by-Step on Hard Math and Code
SRL converts expert trajectories into per-step rewarded actions and lets the model produce a private reasoning span before each action. The resulting dense learning signal improves 7B open models on hard math and coding tasks.
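
To make the per-step idea concrete, here is a minimal sketch of how an expert trajectory might be split into step-level training examples and how a single step could be scored. It rests on assumptions that do not come from the summary above: the function names (`build_step_examples`, `split_reasoning`, `step_reward`) are hypothetical, the private reasoning span is assumed to be wrapped in `<think>...</think>` tags, and the reward is assumed to be a plain string-similarity ratio between the model's action and the expert's action.

```python
import difflib
import re


def build_step_examples(problem, expert_steps):
    """Turn one expert trajectory into one training example per step.

    Example k uses the problem plus the first k expert steps as context
    and expert step k as the target action.
    """
    examples = []
    for k, target in enumerate(expert_steps):
        context = problem + "\n" + "\n".join(expert_steps[:k])
        examples.append({"context": context.strip(), "target_action": target})
    return examples


def split_reasoning(output):
    """Separate a leading <think>...</think> span from the action after it.

    The tag format is an assumption made for this sketch.
    """
    match = re.match(r"\s*<think>(.*?)</think>\s*(.*)", output, flags=re.DOTALL)
    if match is None:
        return "", output.strip()
    return match.group(1).strip(), match.group(2).strip()


def step_reward(model_output, target_action):
    """Score only the action; the reasoning span is not compared.

    Assumed reward: difflib similarity ratio in [0, 1].
    """
    _, action = split_reasoning(model_output)
    return difflib.SequenceMatcher(None, action, target_action).ratio()


examples = build_step_examples("Solve 2x + 3 = 7", ["2x = 4", "x = 2"])
reward = step_reward("<think>subtract 3 from both sides</think>2x = 4",
                     examples[0]["target_action"])
```

Because every step yields its own reward, a trajectory with a wrong final answer can still earn credit for the steps that match the expert, which is what makes the signal dense rather than a single pass/fail outcome.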